ABSTRACT Transposases, enzymes that catalyze the movement of mobile genetic elements, are the most abundant genes in nature.
While many bacteria encode an abundance of transposases in their genomes, the current paradigm is that the expression of
transposase genes is tightly regulated and generally low due to its severe mutagenic effects. In the current study, we detected the 
highest number of transposase proteins ever reported in bacteria, in symbionts of the gutless marine worm Olavius algarvensis 
with metaproteomics. At least 26 different transposases from 12 different families were detected, and genomic and proteomic 
analyses suggest that many of these are active. This high expression of transposases indicates that the mechanisms for their tight 
regulation have been disabled or no longer exist. 

IMPORTANCE The expansion of transposable elements (TE) within the genomes of host-restricted symbionts and pathogens plays 
an important role in their emergence and evolution and might be a key mechanism for adaptation to the host environment. 
However, little is known so far about the underlying causes and evolutionary mechanisms of this TE expansion. The current 
model of genome evolution in host-restricted bacteria explains TE expansion within the confines of the paradigm that trans- 
posase expression is always low. However, recent work failed to verify this model. Based on our data, we hypothesize that in- 
creased transposase expression, which has not previously been described, may play a role in TE expansion, and could be one ex- 
planation for the sometimes very rapid emergence and evolution of new obligate symbionts and pathogens from facultative 
ones. 
Transposases are enzymes that catalyze the movement of mobile
genetic elements in and between genomes and are the most abun- 
dant and ubiquitous genes in nature (1). Most often, transposases 
are part of transposable elements (TEs), which encode only the 
transposase gene and some short flanking sequences necessary for 
transposition. These basic TEs are called insertion sequence (IS) 
elements. Classically, TEs are considered to be selfish genetic ele- 
ments or parasitic DNA with no purpose other than reproducing 
themselves (2-4). However, in more recent years, it has become 
clear that TEs are not always parasites, but can also have beneficial 
effects that increase the fitness of their bacterial host (for reviews 
of the ongoing debate, see references 3-5). TEs (especially IS ele- 
ments) are involved in gene deletions, gene duplications, genome 
rearrangements, gene regulation, and horizontal gene transfer 
(for a review, see references 6 and 7), all of which can have bene- 
ficial effects on the host population by generating genomic diver- 
sity and thus enabling adaptation to environmental changes (8- 
10). However, TEs can also be detrimental if they disrupt 
important functional genes. Therefore, transposase expression 
and, thus, transpositional activity are usually very low (11), because
the mutagenic effects of transposases would drive their hosts
into extinction, thereby also eradicating their own existence (12).
Accordingly, a large variety of mechanisms for the tight regulation 
of transposase expression exists at both the transcriptional and 
translational levels (11). 

TEs and, thus, transposase genes are particularly enriched in 
the genomes of some mutualistic symbionts (here called "symbi- 
onts") and some pathogens that have recently transitioned or are 
transitioning to an obligate, host-associated lifestyle (5, 13). In 
contrast, TEs are absent in the reduced genomes of most obligate, 
host-restricted bacteria that have been associated with their host 
over long evolutionary time periods (TEs absent in 65% of the 
genomes from symbionts classified as obligately intracellular in 
Newton and Bordenstein [13]). There are, however, some ancient
obligate intracellular bacteria, such as Wolbachia pipientis wMel 
and "Candidatus Amoebophilus asiaticus," that have high num- 
bers of TEs in their genomes (14, 15) (TEs present in 35% of the 
genomes from symbionts classified as obligately intracellular in 
Newton and Bordenstein [13]). 

It has been shown that TE expansion in bacteria in transition to 
an obligate, host-associated lifestyle plays a crucial role in the 
emergence and early evolution of pathogens (8, 16-18) and symbionts
(19, 20). Currently, there is much uncertainty about the
factors that lead to high TE loads in host-restricted bacteria, and 
several hypotheses have been put forth (reviewed in reference 3). 
The two main hypotheses, which represent partially opposing 
views, are (i) the selective advantage hypothesis and (ii) the re- 
laxed natural selection hypothesis. In (i), TE expansion is selected 
for because it is beneficial for symbionts and pathogens transition- 
ing to an obligate lifestyle, for example, by providing enhanced 
genomic plasticity for faster adaptation to the host environment 
(8, 9, 21). In (ii), temporary TE expansion in the genomes of 
host-restricted bacteria is due to a reduced effectiveness of natural 
selection against deleterious transpositions (5). The relaxed natu- 
ral selection, according to this second hypothesis, is caused by 
genetic drift due to small population sizes during transmission of 
symbionts from one host generation to the next, and the fact that 
when symbionts reside within a host, many of their genes become 
superfluous and can thus act as neutral integration sites for TEs. 

These two hypotheses can also explain the evolutionary pro- 
cesses involved in TE expansion in ancient obligate intracellular 
bacteria that have a reduced genome, yet still have high TE num- 
bers in their genomes. For these ancient symbionts, the "intracel- 
lular arena hypothesis" explains how TEs first reenter their ge- 
nomes: host switching events bring these symbionts into contact 
with other bacteria during coinfection within the same host and 
thus enable the uptake of foreign TEs (13, 14). These newly ac- 
quired TEs then multiply in the genomes of the ancient symbionts 
either (i) because they confer a selective advantage or (ii) due to 
reduced effectiveness of natural selection in the intracellular envi- 
ronment (5, 22). 

A recent study that tested hypothesis (ii) by subjecting Escherichia
coli for 4,000 generations to simulated conditions of relaxed
natural selection raised doubts as to whether relaxed natural se- 
lection alone can account for TE expansion (21), because no TE 
expansion occurred under the tested conditions. Although Plague 
et al. (21) noted that "there are several possible reasons why [their]
experiment may not have adequately tested the [relaxed natural 
selection] hypothesis," they hypothesized that other factors, in- 
cluding increased transposase activity, might have enabled the 
massive TE expansion observed in host-restricted bacteria. Com- 
mon to both hypotheses (i) and (ii) is that they explain TE expan- 
sion in host-restricted bacteria within the confines of the para- 
digm that transposase expression and, thus, transpositional 
activity are always low. 

High numbers of transposase genes were recently described in 
endosymbionts of the gutless marine oligochaete Olavius algar- 
vensis (23). O. algarvensis inhabits shallow water sediments in the 
Mediterranean and lacks both a digestive and an excretory system, 
relying instead for nutrition and waste recycling on a symbiotic 
community of two gammaproteobacterial sulfur oxidizers (γ1
and γ3 symbionts), two deltaproteobacterial sulfate reducers (δ1
and δ4 symbionts), and a spirochete (23, 24). The symbiotic bacteria
cooccur in an extracellular space just below the worm's cuticle
and above the host's epidermal cells where they are in direct
contact with each other and have access to solutes <70 kDa from 
the environment that can easily diffuse through the worm's cuticle 
(25). Metagenomic analyses of O. algarvensis showed that the γ1
symbiont has a remarkably high percentage of transposases in its
genome, at nearly 21% of all genes, followed by the γ3 symbiont
with 7.5% and the δ1 symbiont with 2.3% (23, 26). Nothing is
currently known about the factors that have led to such high TE
numbers in some of the O. algarvensis symbionts. We assume that
the association with the γ1 symbiont is ancient because these symbionts
occur in many gutless oligochaete species from around the
world, indicating that the common ancestor of all gutless oligochaetes
may have already had the γ1 symbiont (25). However,
the genome of the γ1 symbiont shows no signs of reduction (23).
The associations with the γ3 and deltaproteobacterial symbionts
may be more recent, as these symbionts are found in some but not 
all gutless oligochaetes (27); the genomes of these symbionts are 
also not reduced (23). The transmission mode of symbionts in 
gutless oligochaetes has been examined only in two host species 
from Bermuda and appears to be vertical (28). However, occa- 
sional horizontal uptake of bacteria from the environment during 
egg deposition in the surrounding sediment cannot be excluded. 
Thus, the O. algarvensis symbionts could acquire new TEs from 
both environmental sources and cooccurring symbionts. 

The aim of this study was to examine how the high percentage 
of TEs in the O. algarvensis symbiont genomes affects their expres- 
sion. Interestingly, the relationship between high TE loads and 
transposase expression and activity has, to our knowledge, so far 
not been explored. Our metaproteomic analyses revealed that the 
γ1 and δ1 symbionts express a surprisingly high number of their
transposase proteins, many of which are intact and possibly active, 
and we hypothesize that abundant transposase expression plays a 
key role in TE expansion. 

Methods. Worms were collected as described previously (24)
and were either frozen immediately or homogenized for symbiont
enrichment via isopycnic centrifugation, in which the worm
homogenate was layered on top of a HistoDenz (Sigma) multistep
density gradient. Symbionts were separated from each other and
from host tissue by 1 h of centrifugation at 4°C as described by
Kleiner et al. (24).

High proteome coverage was achieved, i.e., 2,265 symbiont
proteins were detected, via automated 24-h two-dimensional liquid
chromatography followed by tandem mass spectrometry (MS/MS)
with a hybrid linear ion trap-Orbitrap (Thermo Fisher Scientific)
as described in detail by Kleiner et al. (24). Protein
databases, peptide and protein identifications, and all MS/MS
spectra are available from
http://compbio.ornl.gov/olavius_algarvensis_symbiont_metaproteome/,
and detailed data on expressed transposases are available from
http://compbio.ornl.gov/olavius_algarvensis_symbiont_transposases/.

Transposase numbers and abundance in O. algarvensis symbionts.
We detected the highest number of transposase proteins
ever reported in bacteria in the γ1 symbiont of O. algarvensis (22
proteins) (Table 1). Additionally, we detected two transposase
proteins in the δ1 symbiont and two transposase proteins that
could not be unambiguously assigned to a given symbiont (Table 1).
We did not obtain good proteome coverage for the γ3
symbiont due to its low abundance in O. algarvensis and were
therefore unable to determine if any of its numerous transposase
genes were expressed. To compare our data to other studies, we
searched for reports of transposase expression at the protein level
and found that the highest numbers had been reported for free-living
bacteria and not for host-associated bacteria. The highest numbers
of expressed transposases were seven in a proteome of the cultured
bacterium Deinococcus radiodurans (29) and eight in a metaproteome
of the uncultured Leptospirillum group II bacterium from
acid mine drainage (30). To our knowledge, transposase expression
has so far not been examined at the protein level in symbionts
or pathogens with high TE numbers in their genomes.



mBio mbio.asm.org



May/June 2013 Volume 4 Issue 3 e00223-13 



Abundantly Expressed Transposases in Symbionts 



TABLE 1 Overview of all expressed transposases grouped according to shared peptide matches 
a Accession numbers refer to the JGI IMG/M database (http://img.jgi.doe.gov/cgi-bin/m/main.cgi). Sequences on unassigned metagenome fragments can be found at http://compbio.ornl.gov/olavius_algarvensis_symbiont_metaproteome.

b See Table S4 in the supplemental material for details on grouping. 

c IS element family, according to IS Finder. Cutoff amino acid identity of >30%. 

d Similar length and identical Pfam domain structures compared to homologous transposases. Some of the transposases that were identified as not intact may actually be intact and 
only appear to be fragmented due to the incomplete nature of the symbiont metagenome. 
e Identified with an unassigned symbiont metagenome fragment. 
f Only very short contigs available. 

g This number is larger than the actual number, 134, because one transposase fell into two groups.



Transposases comprised up to 1.95% of the total γ1 symbiont
protein (see Table S1 in the supplemental material) and up to
0.084% of the total δ1 symbiont protein (Table S2). The abundance
of transposases (as estimated from label-free proteomic
quantitation) in the γ1 symbiont was greater than that of some of
its most abundant housekeeping proteins, such as the ATPase B 
subunit (1.15%), malate dehydrogenase (0.38%), and 
6-phosphofructokinase (0.35%) (24). Within the context of nat- 
ural microbial communities, highly abundant transposase protein 
expression has, so far, only been reported from a microbial biofilm 
in acid mine drainage; however, relative abundances were esti- 
mated at less than half of the amounts observed here (30). 
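
As a rough illustration of how a relative protein abundance such as the percentages above can be derived from label-free data, the following Python sketch computes normalized spectral abundance factors (NSAF), one common label-free measure. Whether NSAF matches the exact quantitation used for these percentages is an assumption, and the protein names, spectral counts, and protein lengths are hypothetical toy values, not data from this study.

```python
def nsaf_percent(spectral_counts, lengths):
    """Return each protein's NSAF as a percentage of the summed SAF values."""
    # Spectral abundance factor (SAF): spectral count divided by protein length,
    # which corrects for longer proteins yielding more spectra.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Normalize so that all proteins in the sample sum to 100%.
    return {p: 100.0 * value / total for p, value in saf.items()}


# Hypothetical toy data (NOT taken from the study).
counts = {"transposase_A": 30, "atpase_B": 60, "mdh": 10}
lengths = {"transposase_A": 300, "atpase_B": 600, "mdh": 200}
abundance = nsaf_percent(counts, lengths)
```

With values of this kind, the abundance of a transposase can be compared directly with that of a housekeeping protein from the same sample, which is the type of comparison made in the paragraph above.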

Since many transposase genes are present in multiple, nearly 
identical copies in the symbiont metagenomes (23, 26), the 26 
groups of transposase genes (Table 1) could be encoded by as 
many as 134 transposase gene sequences in the metagenome (see 
Table S3 in the supplemental material). Given that proteins en- 
coded by almost-identical sequences with identical tryptic pep- 
tides cannot be distinguished from each other by mass 
spectrometry-based proteomic analyses, we were not able to identify
which of the 134 transposase genes were expressed and if multiple
copies of identical genes were expressed. Therefore, we assigned
the transposases that were identified with similar sets of
peptides to 26 different groups (Table 1; see also Table S4 in the
supplemental material) and used this grouping to identify the
minimal set of transposases that must be expressed to explain all
transposase-related peptides. This nonredundant set consisted of
the above-mentioned 26 transposases.
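
The idea of a minimal set of proteins that explains all observed peptides can be sketched in Python as a greedy set cover. This is an illustrative approximation only and is not claimed to be the grouping procedure used in this study; the function name, protein identifiers, and peptide identifiers are hypothetical.

```python
def minimal_protein_set(protein_peptides):
    """Greedily choose proteins until every observed peptide is explained."""
    # All peptides that must be explained by at least one chosen protein.
    unexplained = set().union(*protein_peptides.values())
    chosen = []
    while unexplained:
        # Choose the protein that explains the most still-unexplained peptides.
        # Candidates are sorted so ties are broken alphabetically.
        best = max(sorted(protein_peptides),
                   key=lambda p: len(protein_peptides[p] & unexplained))
        chosen.append(best)
        unexplained -= protein_peptides[best]
    return chosen


# Hypothetical toy data: two near-identical gene copies share all peptides,
# so only one of them is needed to explain the observations.
hits = {
    "tnp_copy1": {"pep1", "pep2"},
    "tnp_copy2": {"pep1", "pep2"},
    "tnp_other": {"pep3"},
}
nonredundant = minimal_protein_set(hits)
```

In this toy case the two indistinguishable copies collapse into a single representative, mirroring the reasoning that peptides shared by near-identical gene copies cannot reveal which copy was expressed.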

We classified the 26 nonredundant transposases into IS ele- 
ment families using BLASTp against the curated IS Finder data- 
base (31) (http://www-is.biotoul.fr/) and found that they belong 
to at least 10 different IS element families in the γ1 symbiont and
2 families in the δ1 symbiont (Table 1). This clearly shows that the
expressed transposase genes originated from multiple unrelated IS 
elements. 
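
A family assignment of this kind can be sketched in Python as a filter over tabular BLAST output (the standard 12-column tabular format, in which column 3 is percent identity and column 12 is the bit score). The sketch keeps, for each query, the best-scoring hit above a 30% identity cutoff. The subject-to-family lookup table, the identifiers, and the choice of bit score as tie-breaker are assumptions made for illustration, not details taken from this study.

```python
def best_family_hits(blast_lines, subject_family, min_identity=30.0):
    """Map each query to the family of its best hit above the identity cutoff."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or comment lines
        query, subject = fields[0], fields[1]
        identity, bitscore = float(fields[2]), float(fields[11])
        # Require identity strictly above the cutoff and a known family.
        if identity <= min_identity or subject not in subject_family:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject_family[subject], bitscore)
    return {query: family for query, (family, _) in best.items()}


# Hypothetical toy data (identifiers and families are placeholders).
lines = [
    "q1\tIS_x\t45.0\t300\t160\t5\t1\t300\t1\t300\t1e-50\t200",
    "q1\tIS_y\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t90",
    "q2\tIS_y\t25.0\t200\t150\t0\t1\t200\t1\t200\t1e-5\t40",
]
families = {"IS_x": "familyA", "IS_y": "familyB"}
assigned = best_family_hits(lines, families)
```

Here the second query is left unassigned because its only hit falls below the identity cutoff.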

Could abundant transposase expression be caused by stress? 

Previous studies have reported increased transposase expression 
in response to stressful conditions (29, 32). However, relative 
abundances were much lower than those detected in our study. To 
exclude the possibility that transposase expression in the O. algarvensis
symbionts was caused by stressful conditions during the
1-hour-long symbiont enrichment procedure, we did a qualitative
comparison of their proteomes with those of symbionts that were
frozen in whole worms immediately following removal from the
sediment. We also measured high levels of transposase expression
in these immediately frozen symbionts (see Tables S1 and S3 in the
supplemental material). Thus, we conclude that the observed 
transposase expression is not due to stressful sampling and enrich- 
ment conditions but rather reflects expression under natural en- 
vironmental conditions. 

Are the expressed transposases active? We inferred that some 
of the expressed transposases are active by excluding the two main 
reasons for potential inactivity: (i) transposase genes could be in 
the process of gene degradation so that their expression would 
lead to incomplete and potentially inactive transposases and (ii) 
the expression of some transposase genes is regulated through 
programmed translational frameshifting, which would lead to the 
translation of a truncated, nonfunctional version of the trans- 
posase protein (6, 11, 33). 

First, we checked for indications that the expressed transposase 
genes were intact rather than in a state of gene degradation. We 
compared the gene sizes and protein sequences to those of closely 
related transposases in the IS Finder database and compared their 
protein domain structures with the domain structure of similar 
transposases using the "domain organization" feature available on 
Pfam (http://pfam.sanger.ac.uk/search). We found that around 
half of the expressed transposase genes were intact, whereas for the 
other half, we were not able to exclude the possibility that they 
were in some stage of degeneration (Table 1). Second, we checked 
the literature and found that only the IS families of four out of the 
26 detected transposases are known to be regulated by pro- 
grammed translational frameshifting (11) (Table 1). Therefore, 
we assume that the majority of the expressed transposases are 
translated to full-length proteins. 

Additional evidence for the expression of full-length trans- 
posase proteins comes from our proteomic data. For some of the 
identified transposases, we detected peptides not only in the be- 
ginning part of the protein but also in the middle or at the end of 
the protein, a finding which indicates that the protein was translated
from beginning to end (http://compbio.ornl.gov/olavius_algarvensis_symbiont_transposases/).
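
The positional argument above can be illustrated with a short Python sketch that reports which thirds of a protein sequence (start, middle, end) are covered by detected peptides. It uses exact string matching, which is a simplifying assumption; the function name, sequence, and peptides are hypothetical and are not taken from the study's data.

```python
def covered_regions(protein_seq, peptides):
    """Return which thirds of the protein contain at least one peptide."""
    names = ("start", "middle", "end")
    hit = set()
    n = len(protein_seq)
    for pep in peptides:
        pos = protein_seq.find(pep) if pep else -1
        if pos == -1:
            continue  # peptide is empty or not found in this protein
        # Assign the peptide to a third according to its midpoint position.
        midpoint = pos + len(pep) / 2
        hit.add(names[min(2, int(3 * midpoint / n))])
    return [name for name in names if name in hit]


# Hypothetical 33-residue sequence with one N-terminal and one C-terminal peptide.
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
regions = covered_regions(protein, ["MKTAYIAK", "LGLIEVQ"])
```

A protein with peptides in both the first and last third, as in this toy case, is consistent with translation of the full-length product rather than a truncated form.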

Conclusion. Our results show that the paradigm that trans- 
posase expression in bacteria must be tightly regulated and gener- 
ally low to prevent the host population from going extinct does 
not always hold true. We present evidence at the protein level that 
transposases are abundantly expressed in beneficial symbionts 
with high TE numbers in their genomes. This high expression of 
transposases indicates that the mechanisms typically found in bac- 
teria for their tight regulation have been disabled or no longer 
exist, for example, possibly through mutations in proteins that are 
involved in transposase regulation (20, 34, 35). The fixation of 
such mutations may be enabled by the relaxed purifying selection 
suggested by Moran et al. for symbionts and pathogens that re- 
cently transitioned to an obligate, host-associated lifestyle (20). 

Currently, it is not possible to determine whether abundant transposase
expression occurs in other symbionts and pathogens with high TE
numbers, because no comparable proteomic data sets exist for these
bacteria. Many recent studies have shown high transcription
of transposase genes in symbionts and pathogens (15, 36-40),
which may indicate that abundant transposase expression is common
in these bacteria. However, the presence of these transcripts is not
conclusive evidence for transposase expression because it is
possible that they are not translated into proteins due to regulatory
mechanisms (11, 15, 37).



Based on the high numbers of transposase proteins in the O. algarvensis
γ1 symbiont and abundant transposase transcription in
other symbionts and pathogens, we speculate that high trans- 
posase expression is one of the key factors in TE expansion in 
host-restricted bacteria. As discussed above, an experiment that 
simulated conditions of relaxed natural selection failed to cause 
TE expansion after 4,000 generations in E. coli. If, as we speculate, 
high transposase expression is the major catalyst for TE expan- 
sion, the mutations that lead to increased transposase expression 
might have simply not occurred yet in the Plague et al. (21) experiment.
Additional studies that investigate transposase expression
in pathogens and symbionts are needed because high transposase 
expression may be an important factor in the sometimes very 
rapid emergence and evolution of new obligate symbionts and 
pathogens from facultative ones. 

The remarkably high expression of transposases in the O. algarvensis
γ1 symbiont raises the question of how the symbiont can
function over evolutionary time periods given the likelihood that 
high transposase expression leads to an increase in deleterious 
mutations. In other organisms that are confronted with frequent 
transpositions, genome rearrangements and disruptions, it has 
been suggested that polyploidy buffers against the detrimental ef- 
fects of the factors that lead to these genome disruptions such as 
transposable elements, introns, heat and ionizing radiation (41- 
44). Polyploidy has recently been shown for several symbionts, 
including sulfur-oxidizing symbionts of clams (45), and it is possible
that it also plays a role in the O. algarvensis γ1 symbiont.

To gain a better understanding of how the O. algarvensis γ1
symbiont deals with high transposase expression, it would be cru- 
cial to know specifically how high the transpositional activity of 
the abundant transposases is. Does it lead to transpositions in 
every second symbiont or in every millionth? Do the symbionts 
have multiple genome copies that buffer against the deleterious 
effects of transposase activity? To answer these questions, we are 
currently sequencing the genomes of single symbiont cells to ex- 
amine how common transposition events are within the same 
individual host, within the host population, and between host 
populations. 

SUPPLEMENTAL MATERIAL 

Supplemental material for this article may be found at http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.00223-13/-/DCSupplemental.

Table S1, XLS file, 0.1 MB.

Table S2, XLS file, 0.1 MB.

Table S3, XLS file, 0.1 MB.

Table S4, XLS file, 0.1 MB.